
Large CompositeSubscription performance improvements #1145

Closed

Conversation

@akarnokd (Member) commented May 5, 2014

This is a proposal to improve the addition and removal speed of CompositeSubscription when it contains thousands of items.

In some operators, especially when using Schedulers and/or observeOn, thousands of items may be present in the composite, and since adding or removing an item is O(n), each add and remove takes longer and longer as the composite grows.

My proposal is to change the composite state implementation to switch to a HashSet representation above a certain threshold (and back to the array representation below another). The following diagram shows some benchmarking results. (Configuration: i7 920 @ 2.4GHz, 32K L1 Data, 256K L2, 8M L3, 6GB DDR3 1333MHz RAM, Windows 7 x64, Java 8u05 x64.)

[Figure: add/remove benchmark results]

This benchmark shows how the CompositeSubscription behaves when it already holds some items and a new unique item is added and then removed immediately (all single-threaded). The blue line indicates the current master implementation; the red, green and purple lines show the new implementation with thresholds of 8, 16 and 24 respectively. Once the internal array grows past the cache-line size, it is generally better to use a HashSet.
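To make the measured scenario concrete, here is a minimal single-threaded sketch of what one iteration does (not the actual benchmark harness; it assumes Subscriptions.empty() returns a distinct Subscription instance per call, and the pre-fill count of 1024 is arbitrary):

```java
import rx.Subscription;
import rx.subscriptions.CompositeSubscription;
import rx.subscriptions.Subscriptions;

public class AddRemoveScenario {
    public static void main(String[] args) {
        // Pre-fill the composite so the measured add/remove happens at a given size.
        int existingItems = 1024;
        CompositeSubscription composite = new CompositeSubscription();
        for (int i = 0; i < existingItems; i++) {
            composite.add(Subscriptions.empty());
        }

        // The measured operation: add one new unique subscription, then remove it.
        // With the array-based state, both calls are O(existingItems).
        Subscription s = Subscriptions.empty();
        composite.add(s);
        composite.remove(s);
    }
}
```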

The second benchmark compares how fast the CompositeSubscription can be filled with subscribers up to various capacities, and how the target size affects the fill speed.

[Figure: fill benchmark results]

When the composite is being filled, growing the array one element at a time performs better up to a fairly large capacity, but beyond that, appending further items into a HashSet is faster. Unfortunately, the threshold that seems optimal for the first case performs worse in this one.

There are some drawbacks of this hybrid approach:

  • The optimal threshold value depends on the use case and the target system, so how do we let users tune this parameter? Parameter tuning could be handy in other places in RxJava as well.
  • I'm not 100% certain if I've correctly implemented the switchover from atomic states to mutable state and back.
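Independent of the actual PR code, and ignoring the atomic state swapping the second bullet refers to, a single-threaded sketch of the threshold-based switch between an array and a HashSet could look like the following; the class name, field names and threshold value are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, NOT thread-safe sketch of the hybrid idea: keep a plain array while the
// container is small and switch to a HashSet above a threshold (the real
// CompositeSubscription swaps immutable state objects atomically, which this ignores).
final class HybridContainerSketch {
    private static final int THRESHOLD = 16;   // assumed tuning parameter
    private Object[] array = new Object[0];    // small-size representation
    private Set<Object> set;                   // large-size representation, null while small

    void add(Object item) {
        if (set != null) {
            set.add(item);                     // expected O(1) once large
            return;
        }
        Object[] next = new Object[array.length + 1];
        System.arraycopy(array, 0, next, 0, array.length);
        next[array.length] = item;             // O(n) copy while small
        array = next;
        if (array.length > THRESHOLD) {        // grew past the threshold: switch to a set
            set = new HashSet<Object>();
            for (Object o : array) {
                set.add(o);
            }
            array = null;
        }
    }

    void remove(Object item) {
        if (set != null) {
            set.remove(item);
            if (set.size() <= THRESHOLD / 2) { // shrank well below the threshold: switch back
                array = set.toArray();
                set = null;
            }
            return;
        }
        // Linear scan plus compacting copy: O(n) while in the array representation.
        for (int i = 0; i < array.length; i++) {
            if (array[i].equals(item)) {
                Object[] next = new Object[array.length - 1];
                System.arraycopy(array, 0, next, 0, i);
                System.arraycopy(array, i + 1, next, i, array.length - i - 1);
                array = next;
                return;
            }
        }
    }
}
```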

@cloudbees-pull-request-builder

RxJava-pull-requests #1066 SUCCESS
This pull request looks good

@benjchristensen (Member) commented

I need more time to review this ... looks like impressive work and analysis!

Does this still make sense once #1000 is implemented and we can limit the buffering on operators like observeOn? It looks like the 1024 buffer size (a candidate for default buffer even with #1000) can still benefit from this.

@akarnokd (Member Author) commented May 5, 2014

It depends on the usage pattern. If there is a burst of thousands of new subscriptions, a bigger threshold is better. If there are lots of small additions and removals while the composite holds a moderate number of items, a lower threshold is better. I don't know of a case where thousands of subscriptions are suddenly created and added to the composite.

@akarnokd (Member Author) commented

This change has been incorporated into #1190, so I don't recommend merging this. I'll keep it open until the approach is approved or discarded.

@benjchristensen (Member) commented

I tested these changes while playing with performance. As expected, they don't play a role in the normal map/flatMap use cases exercised by this test: https://github.com/benjchristensen/RxJava/blob/performance/rxjava-core/src/perf/java/rx/usecases/PerfTransforms.java

master-performance branch

Benchmark                                       (size)   Mode   Samples         Mean   Mean error    Units
r.u.PerfTransforms.flatMapTransformsUsingFrom        1  thrpt         5  3184873.133   172320.420    ops/s
r.u.PerfTransforms.flatMapTransformsUsingFrom     1024  thrpt         5     9079.937      343.905    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust        1  thrpt         5  3411785.677    73767.161    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust     1024  thrpt         5    10860.963      294.309    ops/s
r.u.PerfTransforms.mapTransformation                 1  thrpt         5  7208334.997   703327.745    ops/s
r.u.PerfTransforms.mapTransformation              1024  thrpt         5    18720.797      278.529    ops/s

with CompositeSubscriptionSpeedup merged

Benchmark                                       (size)   Mode   Samples         Mean   Mean error    Units
r.u.PerfTransforms.flatMapTransformsUsingFrom        1  thrpt         5  3118494.120   173919.112    ops/s
r.u.PerfTransforms.flatMapTransformsUsingFrom     1024  thrpt         5     8715.267      298.710    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust        1  thrpt         5  3140414.540    17172.305    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust     1024  thrpt         5    10226.673      410.795    ops/s
r.u.PerfTransforms.mapTransformation                 1  thrpt         5  6913018.033   123193.805    ops/s
r.u.PerfTransforms.mapTransformation              1024  thrpt         5    18608.990      377.513    ops/s

reverted changes

Benchmark                                       (size)   Mode   Samples         Mean   Mean error    Units
r.u.PerfTransforms.flatMapTransformsUsingFrom        1  thrpt         5  3247111.390    63408.626    ops/s
r.u.PerfTransforms.flatMapTransformsUsingFrom     1024  thrpt         5     9103.650      223.604    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust        1  thrpt         5  3406320.200   121290.155    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust     1024  thrpt         5    10882.203      382.654    ops/s
r.u.PerfTransforms.mapTransformation                 1  thrpt         5  7353154.113   185266.016    ops/s
r.u.PerfTransforms.mapTransformation              1024  thrpt         5    17524.093      518.755    ops/s
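For context on how such numbers are produced: each row is a JMH throughput measurement parameterized by the input size. The real benchmark is the PerfTransforms class linked above; the following is only a hypothetical, self-contained sketch using current JMH annotations (the 2014 harness used an older JMH API) and a plain loop instead of Rx operators:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)     // corresponds to the "thrpt" mode column
@OutputTimeUnit(TimeUnit.SECONDS)   // corresponds to the "ops/s" units column
@State(Scope.Thread)
public class TransformSketch {

    @Param({ "1", "1024" })         // corresponds to the "(size)" column
    int size;

    int[] values;

    @Setup
    public void setup() {
        values = new int[size];
        for (int i = 0; i < size; i++) {
            values[i] = i;
        }
    }

    @Benchmark
    public void mapTransformation(Blackhole bh) {
        // Stand-in for transforming `size` items per benchmark invocation.
        for (int v : values) {
            bh.consume(v + 1);
        }
    }
}
```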

@akarnokd (Member Author) commented

But mine is a bit slower than master and 0.16.x. I think the reason is that, by changing the size of the objects, allocations may cross cache-line boundaries at different points.

@akarnokd (Member Author) commented

I'm closing this because the new subscription containers have these optimizations in separate classes.
